Efficient Paragraph based Chunking and Download Filtering for Plagiarism Source Retrieval

Authors

  • Riya Ravi N
  • Deepa Gupta
Abstract

This paper describes the system we built for our participation in the 'PAN 2015 Source Retrieval' task. Paragraph-based chunking of documents and efficient download filtering improved the overall performance of the system. Source retrieval is an important task of a plagiarism detection system.
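
The abstract names two steps, paragraph-based chunking and download filtering, without giving their details. The following is a minimal Python sketch of what such steps could look like, not the authors' implementation. The function names, the blank-line paragraph rule, the `min_words` cutoff, and the word-overlap threshold are all illustrative assumptions.

```python
import re


def chunk_by_paragraph(text, min_words=5):
    """Split text into paragraph chunks on blank lines.

    Assumption: paragraphs are separated by one or more blank lines,
    and chunks shorter than `min_words` are too short to query with.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    chunks = []
    for paragraph in paragraphs:
        # Collapse internal line breaks and repeated whitespace.
        normalized = " ".join(paragraph.split())
        if len(normalized.split()) >= min_words:
            chunks.append(normalized)
    return chunks


def should_download(chunk, snippet, threshold=0.5):
    """Hypothetical download filter based on word overlap.

    Returns True when at least `threshold` of the chunk's distinct
    words also occur in the search-result snippet.
    """
    chunk_words = set(chunk.lower().split())
    snippet_words = set(snippet.lower().split())
    if not chunk_words:
        return False
    overlap = len(chunk_words & snippet_words) / len(chunk_words)
    return overlap >= threshold
```

In a sketch like this, each chunk would be turned into one or more search queries, and a result would be fetched only when `should_download` accepts its snippet, which keeps the number of downloads low.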

Similar resources

Experiments on Document Chunking and Query Formation for Plagiarism Source Retrieval

This paper presents the details of the system we prepared as a participant in the PAN 2014 task on 'Source Retrieval: Uncovering Plagiarism, Authorship, and Social Software Misuse'. Our work focuses on intelligent chunking of suspicious documents and a hybrid approach to query formation. A method based on term frequency and word co-occurrence is proposed to extract query terms from a non-over...

Source Retrieval and Text Alignment Corpus Construction for Plagiarism Detection

For the task of source retrieval, we focus on the process of download filtering. For the stages from chunking to search control, we aim at high recall, and for the download filtering stage, we focus on improving precision. A vote-based approach and a classification-based approach are combined to filter the search results and obtain the plagiarism sources. For the task of text alignment ...

Source Retrieval Plagiarism Detection based on Weighted Noun phrase and Key phrase Extraction

This paper describes an approach to the source retrieval task of the PAN 2015 competition. We apply two methods to extract important terms, namely weighted noun phrases and keyword phrases, which are extracted from sentences that are long in terms of word count. Queries are constructed from the top-ranked sentences. The prepared system tries to gather a complete dataset of downloaded sources and employ it in query ...

Using Sentence Similarity Measure for Plagiarism Source Retrieval

This paper describes a method that was implemented in the software submitted to the PAN 2014 competition for the source retrieval task. To generate queries, we use the most important noun phrases and words of sentences selected from a given suspicious document. To download documents that are likely to be sources of plagiarism, we employ a sentence similarity measure.

Efficient Technique to Retrieve Plagiarized Documents for Plagiarism Detection

This paper details the approach of implementing an English plagiarism source retrieval system. A given document is broken down into segments using the TextTiling algorithm. These segments are centered around certain topics within the document, and key phrases are generated using the KPMiner keyphrase extraction system. Segments and key phrases are used to create queries of the segment and document. Cha...

Publication date: 2015